[LLM:Feature] Add Segment Mode, Speed up metal llm for 30%-100% by jxt1234 · Pull Request #4543 · alibaba/MNN

jxt1234 · 2026-06-15T10:48:27Z

Description

Module

Type

Checklist

Commit message follows [Module:Type] Description format
Code compiles without errors
Tested on relevant platform(s)
No unrelated format or style changes included

wangzhaode · 2026-06-18T08:08:22Z

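The optimization direction of this PR is valuable, especially for GPU decode of small models. RoPE fusion, fewer NC4HW4 round-trip conversions, and splitting out TopK/embedding/logit are all important improvements.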
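The following minimal pure-Python sketch packs an NCHW buffer into NC4HW4 and back, to show what each round-trip costs. It assumes the layout MNN uses for NC4HW4: channels grouped into blocks of 4, index order [N][C/4][H][W][4], with the last block zero-padded. The function names are hypothetical. Real backends do this in optimized kernels, not Python loops.

```python
def nchw_to_nc4hw4(data, n, c, h, w):
    """Pack a flat NCHW buffer into a flat NC4HW4 buffer (hypothetical helper).

    Channels are grouped in blocks of 4; the tail block is zero-padded
    when c is not a multiple of 4. Output order: [n][c // 4][h][w][c % 4].
    """
    c4 = (c + 3) // 4
    out = [0.0] * (n * c4 * h * w * 4)  # padding slots stay 0.0
    for b in range(n):
        for ch in range(c):
            for y in range(h):
                for x in range(w):
                    src = ((b * c + ch) * h + y) * w + x
                    dst = (((b * c4 + ch // 4) * h + y) * w + x) * 4 + ch % 4
                    out[dst] = data[src]
    return out


def nc4hw4_to_nchw(packed, n, c, h, w):
    """Inverse of nchw_to_nc4hw4: drop the padding and restore NCHW order."""
    c4 = (c + 3) // 4
    out = [0.0] * (n * c * h * w)
    for b in range(n):
        for ch in range(c):
            for y in range(h):
                for x in range(w):
                    src = (((b * c4 + ch // 4) * h + y) * w + x) * 4 + ch % 4
                    dst = ((b * c + ch) * h + y) * w + x
                    out[dst] = packed[src]
    return out
```

Each conversion reads and writes every element once. A pack/unpack pair around an op that could consume NC4HW4 directly (such as a LayerNorm) therefore adds pure memory traffic. That overhead can be significant when decode is memory-bandwidth bound.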

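That said, I suggest splitting the changes into separate submissions/merges. This lowers the cost of review and makes regressions easier to locate: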

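First, split out the general-purpose foundational capabilities:
- OpType_RoPE and its CPU/Metal/OpenCL backend implementations
- Attention output_c4 / attnScale
- NC4HW4 LayerNorm / binary LayerNorm
- MUL_SILU
- TopKV optimization
- SharedGather / prearrange clone support
- Layout propagation rule changes in the converter
These capabilities are general-purpose, so the existing torch -> ONNX -> MNN export path can reuse them later. I suggest giving each one its own regression tests and performance data.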
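
The following minimal sketch illustrates the kind of per-primitive regression test suggested above. It is a pure-Python RoPE reference plus a tolerance check that a backend's fused output could be compared against. It assumes the "rotate-half" pairing (element i paired with i + d/2) and a base of 10000. This PR does not state which convention MNN's OpType_RoPE uses, and `rope_reference` / `allclose` are hypothetical helper names.

```python
import math


def rope_reference(x, position, base=10000.0):
    """Rotary position embedding for one head vector (hypothetical reference).

    Assumes rotate-half pairing; some models pair adjacent elements
    (2i, 2i + 1) instead, which would need a different reference.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        # 2-D rotation of the pair (x1, x2) by angle theta.
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out


def allclose(actual, expected, atol=1e-5, rtol=1e-4):
    """Elementwise |a - e| <= atol + rtol * |e|, with matching lengths."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )
```

In a real test, the expected values come from `rope_reference` and the actual values come from the CPU/Metal/OpenCL kernel run on the same inputs. Cheap invariants such as "position 0 is the identity" and "the vector norm is preserved" catch many indexing and layout bugs before any model-level comparison.
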
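Then submit the SegmentLlm / safetensor workflow path separately:
- segment.py
- safetensors converter / workflow json
- Segmented export into decoder.mnn, embed.mnn, logit.mnn, and topk.mnn
- SegmentLlm runtime loading and inference logic
This part is closer to a new LLM fast path. It can be reviewed independently as an opt-in path. The review should focus on model coverage, sampling behavior, config compatibility, and consistency with the existing llm.mnn path.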
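One simple way to check consistency against the existing llm.mnn path is a teacher-forced greedy comparison. Both paths get the same token prefix at every step. The check reports the first step where their argmax tokens diverge, plus the largest logit difference seen. The sketch assumes each path can be wrapped as a hypothetical `step(tokens) -> logits` callable. It does not use any real MNN API.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: given the token ids so far, return next-token logits.
StepFn = Callable[[Sequence[int]], List[float]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return best


def compare_paths(
    reference: StepFn,
    candidate: StepFn,
    prompt: Sequence[int],
    max_new_tokens: int,
) -> Tuple[Optional[int], float]:
    """Teacher-forced greedy comparison of two decode paths.

    Returns (first divergent step or None, max abs logit difference
    observed up to and including that step).
    """
    tokens = list(prompt)
    max_diff = 0.0
    for step_idx in range(max_new_tokens):
        ref_logits = reference(tokens)
        cand_logits = candidate(tokens)
        if len(ref_logits) != len(cand_logits):
            raise ValueError("vocabulary sizes differ between paths")
        diff = max(abs(a - b) for a, b in zip(ref_logits, cand_logits))
        max_diff = max(max_diff, diff)
        ref_tok = argmax(ref_logits)
        if argmax(cand_logits) != ref_tok:
            return step_idx, max_diff
        # Teacher forcing: both paths continue from the reference token.
        tokens.append(ref_tok)
    return None, max_diff
```

Greedy comparison separates numerical differences from sampling randomness. Verifying sampling behavior as well would require a fixed seed and identical sampler settings on both paths.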

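With this split, the foundational optimizations can land in the main path first. It also becomes easier to tell whether a problem comes from a backend primitive, the layout pass, the converter, or the SegmentLlm runtime. I support the overall direction, but I suggest not bundling the general-purpose foundational capabilities and the new segment execution path into one large PR that is merged all at once.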

jxt1234 force-pushed the feature/llm_mini branch 2 times, most recently from 4418abe to bebbc9a, on June 15, 2026 11:02

wangzhaode self-assigned this Jun 15, 2026

jxt1234 force-pushed the feature/llm_mini branch from bebbc9a to a58b9d6 on June 16, 2026 02:43

jxt1234 changed the title from ~~[LLM:Feature] Support Segment Mode, currently only support metal backend~~ to [LLM:Feature] Add Segment Mode, Speed up metal llm for 30%-100% on Jun 16, 2026

jxt1234 force-pushed the feature/llm_mini branch from a58b9d6 to 13e553a on June 18, 2026 02:42

jxt1234 force-pushed the feature/llm_mini branch from 13e553a to 25fce5f on June 18, 2026 08:31

jxt1234 added 2 commits on June 18, 2026 16:54:
- [LLM:Feature] Add segment operator runtime support (8ecf6b8)
- [LLM:Feature] Add segment converter and runtime (9c8fdb6)

jxt1234 force-pushed the feature/llm_mini branch from 25fce5f to 9c8fdb6 on June 18, 2026 09:34

jxt1234 wants to merge 2 commits into alibaba:master from jxt1234:feature/llm_mini